# Processed Drosophila locomotion dataset used in the CDC-FM paper

This dataset contains processed Drosophila locomotion data derived from the dataset published in:
Brian D DeAngelis, Jacob A Zavatone-Veth, Damon A Clark (2019). The manifold structure of limb coordination in walking Drosophila. eLife 8:e46409 (https://doi.org/10.7554/eLife.46409).

The processed dataset is used in: 
Bamberger, J., Jones, I., Duncan, D., Bronstein, M. M., Vandergheynst, P., and Gosztolai, A. Carré du champ flow matching: better quality–generalisation tradeoff in generative models. ICLR 2026 (https://arxiv.org/pdf/2510.05930).

`drosophila_data.h5` contains two fields:
- `L`: limb-movement data [mm]; array of shape `(31, 12, 100000)` = (# frames per clip at 150 Hz, 6 limbs × 2 (x/y coords), # datapoints).
- `dPhi`: phase-frequency data [Hz]; array of shape `(31, 6, 100000)` = (# frames per clip at 150 Hz, 6 limbs, # datapoints).
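
A minimal Python sketch for reading the two fields is given below. It assumes the `h5py` package is installed and that `L` and `dPhi` are stored as top-level datasets in the file; neither is confirmed by this README. The helper names (`DOCUMENTED_SHAPES`, `matches_documented`, `load_fields`) are hypothetical. If the file was written from MATLAB, a C-order reader such as `h5py` may report the axes in reverse order, so the check accepts both orientations.

```python
from pathlib import Path

# Shapes as documented above. A C-order reader such as h5py may report the
# axes reversed if the file was written from MATLAB (assumption, not verified).
DOCUMENTED_SHAPES = {"L": (31, 12, 100000), "dPhi": (31, 6, 100000)}


def matches_documented(name, shape):
    """Return True if `shape` equals the documented shape or its reverse."""
    expected = DOCUMENTED_SHAPES[name]
    return tuple(shape) in (expected, expected[::-1])


def load_fields(path="drosophila_data.h5"):
    """Read both fields into NumPy arrays (requires the h5py package)."""
    import h5py  # imported lazily so the shape helper works without it

    with h5py.File(path, "r") as f:
        # Assumes both fields are datasets at the root of the file.
        return {name: f[name][()] for name in DOCUMENTED_SHAPES}


if __name__ == "__main__" and Path("drosophila_data.h5").exists():
    for name, arr in load_fields().items():
        print(name, arr.shape, arr.dtype, matches_documented(name, arr.shape))
```

Printing the shapes before any further processing makes it easy to see which axis holds the frames, the limb channels, and the datapoints in a given reader.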

## How to recreate the processed dataset `drosophila_data.h5`
1. Download the raw data from the original work (DeAngelis et al.):
    - Go to: https://datadryad.org/dataset/doi:10.5061/dryad.3p9h20r
    - Download the file `20181025_20180530-20180614_IsoD1_Glass_MaskedModel_1000PCs_amplitude_phase_down_downcam_Steps(_down_cam).mat` (2.34 GB)
2. Rerun the analysis code `MakeUMAPFigures.m` from DeAngelis et al. and save the intermediate data:
    - Clone the MATLAB-based repository (DeAngelis et al.): https://github.com/ClarkLabCode/GaitPaperCode
    - Replace `AnalysisUtilities/PrepareDataForUMAP.m` with the `PrepareDataForUMAP.m` provided here.
    - Move the raw data file (`20181025_<...>.mat`) to the repository root.
    - Run the MATLAB script `MakeUMAPFigures.m` from the repository root.
    - This creates `drosophila_data.h5` in the repository root.
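
Before launching MATLAB, the file layout described in step 2 can be checked with a short script. The following is only a sketch: the function name `missing_inputs` is hypothetical, the raw data file is matched by its `20181025_` prefix rather than its full name, and the check cannot tell whether `PrepareDataForUMAP.m` has actually been replaced with the version provided here.

```python
from pathlib import Path


def missing_inputs(repo_root):
    """Return descriptions of files that step 2 expects but that are absent."""
    root = Path(repo_root)
    missing = []
    # The replaced helper script must be present inside the cloned repository.
    if not (root / "AnalysisUtilities" / "PrepareDataForUMAP.m").is_file():
        missing.append("AnalysisUtilities/PrepareDataForUMAP.m")
    # The raw data file is matched by prefix only, since its full name is long.
    if not any(root.glob("20181025_*.mat")):
        missing.append("20181025_<...>.mat (raw data)")
    return missing


if __name__ == "__main__":
    problems = missing_inputs(".")
    if problems:
        print("missing:", ", ".join(problems))
    else:
        print("inputs found; run MakeUMAPFigures.m from the repository root")
```

Run it from the root of the cloned `GaitPaperCode` repository; an empty result only means the expected files are in place, not that the MATLAB run will succeed.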